Masking repeats while clustering ESTs
Authors
Abstract
A problem in EST clustering is the presence of repeat sequences. To avoid false matches, repeats have to be masked. This can be a time-consuming process, and it depends on available repeat libraries. We present a fast and effective method that aims to eliminate the problems that repeats cause in the process of clustering. Unlike traditional methods, it infers repeats directly from the EST data and does not rely on any external library of known repeats. This makes the method especially suitable for analysing ESTs from organisms without good repeat libraries. We demonstrate that the result is very similar to performing standard repeat masking before clustering.
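The abstract does not spell out the algorithm. As a rough illustration of the general idea of inferring repeats from the EST data itself and ignoring them during clustering, here is a minimal Python sketch under one simple assumption: a k-mer that occurs in more than `max_ests` distinct ESTs is treated as a repeat and is not allowed to link ESTs into a cluster. This frequency heuristic, the parameters `k` and `max_ests`, and the function names `repeat_kmers`, `mask` and `cluster` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (position, k-mer) pairs for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def repeat_kmers(ests, k, max_ests):
    """Return the k-mers found in more than max_ests distinct ESTs.

    Assumption (not from the paper): a k-mer shared by unusually many
    ESTs is treated as a repeat.
    """
    support = defaultdict(set)  # k-mer -> indices of ESTs containing it
    for idx, seq in enumerate(ests):
        for _, word in kmers(seq, k):
            support[word].add(idx)
    return {word for word, owners in support.items() if len(owners) > max_ests}


def mask(seq, k, repeats):
    """Return seq with every position covered by a repeat k-mer set to 'N'."""
    chars = list(seq)
    for i, word in kmers(seq, k):
        if word in repeats:
            chars[i:i + k] = "N" * k
    return "".join(chars)


def cluster(ests, k, max_ests):
    """Single-linkage clustering: ESTs sharing a non-repeat k-mer are joined.

    Repeat k-mers are skipped while matching, i.e. masked during clustering.
    """
    repeats = repeat_kmers(ests, k, max_ests)
    parent = list(range(len(ests)))  # union-find forest over EST indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # non-repeat k-mer -> first EST that contained it
    for idx, seq in enumerate(ests):
        for _, word in kmers(seq, k):
            if word in repeats or "N" in word:
                continue  # masked: this k-mer may not create a link
            if word in first_seen:
                parent[find(idx)] = find(first_seen[word])
            else:
                first_seen[word] = idx

    groups = defaultdict(list)
    for idx in range(len(ests)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


if __name__ == "__main__":
    ests = [
        "CATTACGGGGGGGG",  # gene X fragment + shared repeat
        "ACATTACT",        # overlapping gene X fragment, no repeat
        "TTCCAAGGGGGGGG",  # gene Y fragment + shared repeat
        "GGGGGGGGTCTCTC",  # gene Z fragment + shared repeat
    ]
    print(mask(ests[0], 5, repeat_kmers(ests, 5, 2)))  # CATTACNNNNNNNN
    print(cluster(ests, k=5, max_ests=2))   # [[0, 1], [2], [3]]
    print(cluster(ests, k=5, max_ests=10))  # [[0, 1, 2, 3]]
```

With the threshold in place, the over-represented k-mer `GGGGG` is ignored, so only the genuine overlap between the two gene X fragments joins them; with the threshold effectively disabled, the shared repeat merges all four unrelated ESTs into one cluster. A practical tool would also need to handle reverse complements and use a more scalable index than a Python dictionary; both are left out of this sketch.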
Similar resources
RBR: library-less repeat detection for ESTs
MOTIVATION Repeat sequences in ESTs are a source of problems, in particular for clustering. ESTs are therefore commonly masked against a library of known repeats. High quality repeat libraries are available for the widely studied organisms, but for most other organisms the lack of such libraries is likely to compromise the quality of EST analysis. RESULTS We present a fast, flexible and libra...
In silico prediction of UTR repeats using clustered EST data
Clustering of EST data is a method for the non-redundant representation of an organism's transcriptome. During clustering of large amounts of EST data, some large clusters (>500 sequences) are usually created. These can lead to iterative contig builds, consumption of large amounts of computing time and improbable exon alignments, which is undesirable. In addition, these clusters sometimes contain transc...
EGassembler: online bioinformatics service for large-scale processing, clustering and assembling ESTs and genomic DNA fragments
Expressed sequence tag (EST) sequencing has proven to be an economically feasible alternative for gene discovery in species lacking a draft genome sequence. Ongoing large-scale EST sequencing projects need bioinformatics tools to facilitate uniform EST handling. This renews the importance of a universal tool for processing and functional annotation of large sets of EST...
Algorithms for the Analysis of Expressed Sequence Tags
A problem in EST clustering is the presence of repeat sequences. To avoid false matches, repeats have to be masked. This can be a time-consuming process, and it depends on available repeat libraries. We present a fast and library-independent method to eliminate this problem in the process of clustering. We demonstrate that the result is very similar to performing standard repeat masking before ...
A Fast Clustering System for a Huge Number of Nucleotide Sequences
Single-pass sequences of mRNA, called ESTs, have been determined extensively. They have been accumulated in the dbEST database in GenBank. The number of ESTs in dbEST exceeded eight million in August 2002. By clustering and assembling ESTs, we can conduct the following analyses. First, we can obtain complete ORF sequences based on ESTs that are fragment sequences of mRNA and do not ...
Journal title:
- Nucleic Acids Research
Volume 33, Issue
Pages -
Publication date: 2005